The Correctness of the Definite Assignment Analysis in C#
Abstract
… syntax tree. For example, consider that true||b is replaced by true in the following if statement:

    if (true||b) i = 1; else { int j = i; }

Although the new test (i.e. true) cannot evaluate to false, we still add the edge (F(β), B(γ)) to the graph; the false point of true, however, is never reachable (see Table 4).

In the presence of finally blocks, the jump statements goto, continue and break add complexity to the graph. When a jump statement exits a try block, control is transferred first to the innermost finally block. If control reaches the end point of that finally block, it is transferred to the next innermost finally block, and so on. If control reaches the end point of the outermost finally block, it is transferred to the target of the jump statement. For these control transfers we have special edges in our graph. One important detail needs care: these special edges cannot be used for paths other than those which connect the jump statement with its target. In other words, if a path uses such an edge, then the path necessarily contains the entry point of the jump statement. For this reason, we say that an edge e is conditioned by a point i, meaning that e can be used only in paths that contain i. Without this restriction,

    [B(mb), B(α1), B(α2), B(α3), B(α4), B(α5), A(α5), B(α6)]

would be a possible execution path to the labeled statement in the following method body, in case the evaluation of α4 throws an exception:

    α1 try α2 { α3 (α4 (i = 3/j);) goto L; } finally α5 {} α6 L: Console.WriteLine(i);

But this does not match the control transfer described in the C# Specification.

The following sets introduce the edges described above. If α and β are two statements and Fin(α, β) is the list [γ1, ..., γn], then the set ThroughFin_b(α, β) consists of the edges (B(α), B(γ1)), (A(γn), B(β)) and (A(γi), B(γi+1)) for i = 1, ..., n−1, all conditioned by B(α), while the set ThroughFin_a(α, β) contains the edges (B(α), B(γ1)), (A(γn), A(β)) and (A(γi), B(γi+1)) for i = 1, ..., n−1, all conditioned by B(α). If Fin(α, β) is empty, then ThroughFin_b(α, β) contains only the edge (B(α), B(β)), while ThroughFin_a(α, β) contains only the edge (B(α), A(β)). In the previous example, where α denotes the goto statement, the list Fin(α, α6) is [α5], and the set ThroughFin_b(α, α6) contains the edges (B(α), B(α5)) and (A(α5), B(α6)), both conditioned by B(α).

Note that in Table 6, for goto and continue, the set of edges ThroughFin_b is added to the graph, since after executing the finally blocks control is transferred to the entry point of the labeled statement and of the while statement, respectively. In the case of break, however, the set ThroughFin_a is used, since control is finally transferred to the end point of the while statement.

JOURNAL OF OBJECT TECHNOLOGY, VOL. 3, NO. 9

There are two more remarks concerning the try statement. First, since a reason for abruption (e.g. an exception) can occur at any time in a try block, we should have edges from every point in a try block to every associated catch block, to every catch block of enclosing try statements (if the catch clause matches the type of the exception), and to every associated finally block (if none of the catch clauses matches the type of the exception). We do not consider all these edges, since the definite assignment analysis is an “over all paths” analysis: it is equivalent to consider only one edge to the entry points of the catch and finally blocks, namely the one from the entry point of the try block (see Table 6).

The next remark concerns the end point A(α) of a try-finally statement α. The C# Specification states in §8.10 that A(α) is reachable only if the end points of both the try block β and the finally block γ are reachable.
The only edge to A(α) is (A(γ), A(α)), and we know that the finally block can be reached either through a jump or through a normal completion of the try block. In the case of a jump, if control reaches the end point A(γ) of the finally block, then it is transferred further to the target of the statement which generated the jump and not to A(α). This means that all paths to A(α) also contain the end point A(β) of the try block. That is why we require that the edge (A(γ), A(α)) is conditioned by A(β) (see Table 6); otherwise, in the following example, A(α) would be reachable in our graph (under the assumption that B(α) is reachable):

    α try β { goto L; } finally γ {}

Therefore we do not consider all the paths in the graph but only the valid paths, that is, the paths p for which the following is true: if p uses a conditioned edge, then it also contains the point which conditions the edge. Formally:

    valid([α1, ..., αn]) ≡ for every conditioned edge (αi, αi+1), ∃ j < i such that (αi, αi+1) is conditioned by αj

If α is an expression or a statement, then path_b(α) is the set of all valid paths from the entry point of the method body B(mb) to the entry point B(α) of α:

    path_b(α) = { [α1, ..., αn] | α1 = B(mb), αn = B(α), (αi, αi+1) ∈ CFG for i = 1, ..., n−1, and valid([α1, ..., αn]) }

Similarly, path_a(α) is the set of all valid paths from the entry point of the method body B(mb) to the end point A(α) of α, while if α is a boolean expression, path_t(α) and path_f(α) are the sets of all valid paths from B(mb) to the true point T(α) and to the false point F(α) of α, respectively.

In the proofs in the next section we use the following two notations. If p is a path, then p[i, j] is the subpath of p which connects the point i with the point j. Also, on the set of all paths we take the operation ⊕ to be path concatenation (defined also for infinite paths).
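The through-finally edges and the validity condition on paths can be illustrated with a small Python sketch. It is not taken from the paper: the function names through_fin and valid, the string encoding of program points, and the label B(goto) for the entry point of the goto statement are all assumptions made for illustration. The check reads "contains the conditioning point" as "the point occurs at or before the source of the edge", so that the first edge leaving a jump statement, whose source is its own conditioning point, passes; that reading is also an assumption.

```python
from typing import Dict, List, Tuple

Point = str                   # a program point such as "B(a5)" or "A(a5)"
Edge = Tuple[Point, Point]


def through_fin(jump: Point, fin: List[str], target: Point) -> Dict[Edge, Point]:
    """Build the through-finally edges for a jump statement.

    jump   -- entry point of the jump statement
    fin    -- labels of the finally blocks to run, innermost first
    target -- entry point of the target (ThroughFin_b) or its end point
              (ThroughFin_a)
    Returns a map from each edge to the point that conditions it.
    """
    if not fin:
        # No finally block in between: a single direct edge. Conditioning it
        # by its own source is vacuous (assumption made for uniformity).
        return {(jump, target): jump}
    edges: Dict[Edge, Point] = {(jump, f"B({fin[0]})"): jump}
    for cur, nxt in zip(fin, fin[1:]):
        edges[(f"A({cur})", f"B({nxt})")] = jump
    edges[(f"A({fin[-1]})", target)] = jump
    return edges


def valid(path: List[Point], conditioned: Dict[Edge, Point]) -> bool:
    """True if every conditioned edge used by the path is preceded
    (at or before the edge's source) by its conditioning point."""
    for i in range(len(path) - 1):
        edge = (path[i], path[i + 1])
        if edge in conditioned and conditioned[edge] not in path[: i + 1]:
            return False
    return True


# goto example: one finally block a5 lies between the goto and its target a6.
goto_edges = through_fin("B(goto)", ["a5"], "B(a6)")

# Path taken when a4 throws: it reaches the finally block without the goto.
exception_path = ["B(mb)", "B(a1)", "B(a2)", "B(a3)", "B(a4)",
                  "B(a5)", "A(a5)", "B(a6)"]
# Path through the goto; the intermediate points A(a4), A(a3) are assumed.
goto_path = ["B(mb)", "B(a1)", "B(a2)", "B(a3)", "B(a4)", "A(a4)", "A(a3)",
             "B(goto)", "B(a5)", "A(a5)", "B(a6)"]

# try-finally example: the edge (A(g), A(a)) is conditioned by A(b).
try_finally_edges = {("A(g)", "A(a)"): "A(b)"}
try_finally_edges.update(through_fin("B(goto)", ["g"], "B(L)"))
jump_path = ["B(a)", "B(b)", "B(goto)", "B(g)", "A(g)", "A(a)"]

print(valid(exception_path, goto_edges))    # False
print(valid(goto_path, goto_edges))         # True
print(valid(jump_path, try_finally_edges))  # False
```

The first result shows why the exception path to the labeled statement is excluded: it uses the edge (A(a5), B(a6)) without ever passing through the goto. The last result shows that the end point of the try-finally statement is not reached by a path that leaves the try block through a jump.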
5 CORRECTNESS OF THE ANALYSIS

We prove that when a C# compiler relies on the sets MFP_b, MFP_a, MFP_t and MFP_f derived from the maximal fixed point of the data flow equations in Section 2 (or on their expanded sets if we allow struct type variables), then all accesses to the value of a local variable occur after it is initialized. In other words, the correctness of the analysis means that if a local variable is in one of the four sets, that is, the analysis infers the variable as definitely assigned at a certain program point, then this variable is actually assigned at that point on every execution path of the program. A variable loc is assigned on a path if the path contains an initialization of loc or a catch clause whose exception variable is loc. The following definition describes what we mean by initialization.

Definition 2 A path p contains an initialization of a local variable loc if at least one of the following is true:
• p contains a simple assignment (not a compound assignment) to loc, or
• p contains a method invocation for which loc is an out argument.

Struct type variables. The definition above has to be extended if we also want to allow variables of struct types. A path p then contains an initialization of a local variable loc also in one of the following cases:
• loc is an instance field of a struct type variable x and p contains an initialization of x, or
• loc is of a struct type and p contains initializations for each instance field of loc.

We actually prove more than correctness. We show that the components of the maximal fixed point MFP are exactly the sets of variables which are assigned on every possible execution path to the appropriate point (and not only a safe approximation). In order to formalize this we define the following sets.
If α is an arbitrary expression or statement, then AP_b(α) denotes the set of local variables in vars(α) (the variables in whose scope α lies) that are assigned on every path in path_b(α):

    AP_b(α) = { x ∈ vars(α) | x is assigned on every path p ∈ path_b(α) }

AP_a(α) is the set of variables in vars(α) which are assigned on every path in path_a(α), while for a boolean expression α the sets AP_t(α) and AP_f(α) are defined in the same way, but with respect to the paths in path_t(α) and path_f(α), respectively.

Struct type variables. If we also consider variables of struct types, the definition of “is assigned on” is extended as pointed out above, and the definitions of the sets AP_b, AP_a, AP_t and AP_f are adapted accordingly. With the new definition of “is assigned”, one can easily observe that the AP sets extended to struct type variables are nothing other than their expanded sets. So the same “expansion” function mentioned in Section 3 is applied to the AP sets as well in order to include struct type variables.

The following result is used to prove Lemma 5.

Lemma 4 For every expression or statement α, if MFP_b(α) ⊆ vars(α) holds, then we have MFP_a(α) ⊆ vars(α). Moreover, if α is a boolean expression, then we also have MFP_t(α) ⊆ vars(α) and MFP_f(α) ⊆ vars(α).

Proof. The proof proceeds by induction over the structure of expressions and statements. We first prove the base cases of the induction, i.e. the implications stated above for all possible leaves of the abstract syntax tree (AST) of our method body. The expressions which are leaves in the AST are the following: true, false, loc, lit and c.f. Since MFP is in particular a solution of the data flow equations, the implications stated in the lemma are satisfied for them. The statements considered leaves in the AST are the empty-statement, goto L, break, continue, return and throw.
For the last five, the equations directly give MFP_a(α) ⊆ vars(α). For the empty-statement this is true as well, since our hypothesis is MFP_b(α) ⊆ vars(α). In the induction step the implications for each expression and statement are proved under the assumption that their “children” (subexpressions/substatements) satisfy the implications. □

The next lemma is used in the proof of the correctness theorem (Theorem 1). It claims that the MFP sets of an expression or statement α consist of variables in whose scope α is located.

Lemma 5 For every expression or statement α, we have MFP_b(α) ⊆ vars(α) and MFP_a(α) ⊆ vars(α). Moreover, if α is a boolean expression, then we also have MFP_t(α) ⊆ vars(α) and MFP_f(α) ⊆ vars(α).

Proof. We show the above inclusions for all expressions and statements by induction over the AST, starting at the root, i.e. the method body (the basis of the induction). Notice that the induction schema runs in the opposite direction compared to that in Lemma 4. The induction step is therefore: under the assumption that a node of the AST satisfies the inclusions, all its “children” (subexpressions/substatements) satisfy the inclusions as well. According to Lemma 4 it is enough to prove MFP_b(α) ⊆ vars(α) for all labels α. For our method body this is trivial: the relation MFP_b(mb) ⊆ vars(mb) holds since MFP_b(mb) = vars(mb) = ∅. Lemma 4 is used again in the next step of the proof, which consists in showing for each expression and statement that, under the assumption MFP_b(α) ⊆ vars(α), each of its direct subexpressions/substatements β satisfies MFP_b(β) ⊆ vars(β). □

The correctness of the definite assignment analysis in C# is proved in the following theorem, which claims that the analysis is a safe approximation.

Theorem 1 (safe approximation) For every expression or statement α, the following relations are true: MFP_b(α) ⊆ AP_b(α) and MFP_a(α) ⊆ AP_a(α).
Moreover, if α is a boolean expression, then we also have MFP_t(α) ⊆ AP_t(α) and MFP_f(α) ⊆ AP_f(α).

Proof. We consider the following definitions. The set AP_b^n(α) is defined in the same way as AP_b(α), except that we consider only paths of length less than n. Similarly, we define the sets AP_a^n(α), AP_t^n(α) and AP_f^n(α) (analogously, we have definitions for the sets of paths path_b^n, path_a^n, path_t^n and path_f^n). According to these definitions, the following set equalities hold for every α:
Journal: Journal of Object Technology
Volume: 3
Publication date: 2004